A METHOD OF, AND SYSTEM FOR
DETECTING MASS MAILING VIRUSES 

The present invention relates to a method of, and system for, detecting mass
mailing viruses.

The internet and local- and wide-area networks are susceptible to the
exploits of mass mailing viruses. Typically, these viruses involve an email with an
executable attachment which, when it executes, causes more virus-containing emails to be
created and sent, flooding the network with traffic and its email users with unwanted
emails.

These mass mailing viruses have become increasingly sophisticated: early 
forms of them chose the addressees from the name and address book associated with the 
recipient's email client, while more recent forms use a variety of techniques to gather 
addresses. 

As the number of mass mailing viruses has grown, the authors of anti-virus
scanning systems have had to enhance their systems to try to keep up with the threat.
One tried and tested technique for detecting viruses is "signature scanning", where a file
(an executable attachment, in the case of email) is scanned for signatures, i.e. sequences, or
patterns of sequences, of bytes which have been identified as characteristic of particular
viruses. However, signature-based scanning is not particularly effective against
mass mailing viruses: when an outbreak of a new virus occurs, the time the virus takes to do
its work and cause copies of itself to be sent is small compared with the time it takes
anti-virus software houses to disseminate updates to their systems to deal with it.

This is a particular problem where the anti-virus service is operated on behalf of a
large number of users, as may be the case where an ISP (Internet Service Provider) carries
out anti-virus scanning of email and other files in transit on behalf of customers as a
value-added service.

The present invention is based upon an appreciation of the fact that
concentrating on executable attachments overlooks a fertile source of viral-indicating
information, namely the email itself, and operates by carefully considering the whole email,
rather than just the attachments.

According to the present invention, there is provided a method of anti-virus
processing an email having one or more executable attachments, comprising the steps,
executed by a machine, of:

a) extracting structural elements from the email; 
b) examining the executable attachments for code, data or encoded data
that could have created the structural elements extracted earlier; and

c) signalling that the attachment is possibly viral or not on the
basis of the extent to which the examining step b) finds evidence that the structural
elements have been created by that attachment.

The invention also provides a system for anti-virus processing an email
having an executable attachment, comprising the following means, implemented by a
machine:

a) means for extracting structural elements from the email; 
b) means for examining the executable attachments for code, data or
encoded data that could have created the structural elements extracted earlier; and

c) means for signalling that the attachment is possibly viral or not on the
basis of the extent to which the examining means b) finds evidence that the structural
elements have been created by that attachment.
The invention will be further described by way of non-limitative example
with reference to the accompanying drawing in which:

Figure 1 illustrates one embodiment of a system according to the present
invention.

The system 100 operates on emails arriving at an input 101 and processes
each one to signal either at an output 102 that the system regards that email as non-viral or
at an output 103 that the system regards it as viral or possibly viral. The system may be
operated as a stand-alone system, or as part of a larger anti-virus system, either as the
module with responsibility for processing emails or in conjunction with additional
sub-systems which apply additional virus-detection heuristics to emails which it has signalled
as possibly viral.

Most prior virus scanners scan email by taking the email, extracting the
attachments, and then scanning them for malware. The system 100, on the other hand,
operates on the basis that by carefully considering the email as a whole, rather than just the
attachments, it is possible to greatly increase the chances of detecting mass mailing
viruses.

Each email client creates emails in its own unique way, producing what one
might term an email 'fingerprint' which is discernible in emails created by it. By
examining the structure of emails it is possible to say with some certainty, for instance, that
a particular email was created by Microsoft Outlook, or Lotus Notes, or Eudora. Mass
mailing viruses are also a form of email client, since they generate emails, and they will
create emails with a particular fingerprint. By carefully examining the executable
attachment and comparing it with the fingerprint of the email it is contained in, it is
possible to say with some certainty whether the attachment generated the actual email; if it
did, this is a very good sign that the attachment is a mass mailing virus.

The system 100 operates according to the following algorithm: 

1) A 'gatherer' 104 takes the email from input 101 and creates
fingerprint information about the email structure.

2) An 'extractor' 105 extracts the next attachment from the email. If
there are no more attachments left, processing stops.

3) An 'analyser/matcher' 106 analyses the attachment, comparing it with the
fingerprint information, to see whether it is likely that the attachment created the email. If it is
not likely that the attachment created the email, return to step 2.

4) An 'exception checker' 107 checks for known exceptions. If an
exception list match is found, return to step 2.

5) The email is flagged at output 103 as possibly containing a mass
mailing virus.
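The five steps above can be sketched as a simple loop. This is a minimal Python sketch, not the patented implementation: the four callables stand in for the gatherer 104, extractor 105, analyser/matcher 106 and exception checker 107, and the names `process_email` and `Verdict` are hypothetical. It also assumes that processing stops at the first flagged attachment, which the text does not state.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass
class Verdict:
    possibly_viral: bool  # True corresponds to output 103, False to output 102
    reason: str = ""


def process_email(
    raw_email: bytes,
    gather: Callable[[bytes], dict],                          # gatherer 104 (hypothetical signature)
    extract: Callable[[bytes], Iterable[Tuple[str, bytes]]],  # extractor 105: (name, data) pairs
    created_email: Callable[[bytes, dict], bool],             # analyser/matcher 106
    is_exception: Callable[[bytes], bool],                    # exception checker 107
) -> Verdict:
    fingerprint = gather(raw_email)                # step 1: fingerprint the email structure
    for name, data in extract(raw_email):          # step 2: next attachment; loop ends when none are left
        if not created_email(data, fingerprint):   # step 3: unlikely to have created the email
            continue
        if is_exception(data):                     # step 4: known exception, e.g. a genuine mail client
            continue
        # step 5: flag the email (this sketch stops at the first flagged attachment)
        return Verdict(True, f"attachment {name!r} may have created this email")
    return Verdict(False)
```

Passing the components in as functions keeps the loop independent of how sophisticated each component is.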

The gatherer 104 parses the email, searching for structural information. For
example, this could include (but is not limited to) the following:

• Standard MIME headers created
• Unusual MIME headers created
• Deviations from RFC standards (for example, missing out the final
  MIME boundary)
• Unusual constructs which are legal according to the RFC standards, but
  which are not generally used by mainstream email clients, e.g. unusual
  capitalisation of MIME headers; comment fields used in certain
  MIME headers where they are not normally used; nested comment
  fields
• Number of attachments
• Type of attachments
• Encoding method used for attachments
• Text content of the email
• HTML/XHTML content of the email
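A gatherer covering part of the list above might look like the following minimal sketch, assuming Python's standard `email` package with the `compat32` policy (which preserves header names as written). The function name, the dictionary keys and the `CANONICAL_HEADERS` reference list are hypothetical; comment fields and nested comments are not examined.

```python
import email
from email import policy

# Hypothetical reference spellings used to spot unusual header capitalisation.
CANONICAL_HEADERS = ["MIME-Version", "Content-Type", "Content-Transfer-Encoding",
                     "Content-Disposition", "From", "To", "Subject", "Date", "Message-ID"]


def gather_fingerprint(raw: bytes) -> dict:
    """Collect a few of the structural features listed above (a partial, illustrative set)."""
    msg = email.message_from_bytes(raw, policy=policy.compat32)
    canonical = {h.lower(): h for h in CANONICAL_HEADERS}
    fp = {"header_order": list(msg.keys()),   # top-level header names, in order, as written
          "odd_capitalisation": [], "attachments": [], "text": [], "html": [],
          "missing_final_boundary": False}
    boundary = msg.get_boundary()
    if msg.is_multipart() and boundary:
        # RFC 2046 requires a closing "--boundary--" line; some mailers omit it.
        fp["missing_final_boundary"] = ("--" + boundary + "--").encode() not in raw
    for part in msg.walk():
        for name in part.keys():
            expected = canonical.get(name.lower())
            if expected and name != expected:
                fp["odd_capitalisation"].append(name)
        if part.is_multipart():
            continue
        ctype = part.get_content_type()
        filename = part.get_filename()
        if filename:
            # (name, type, encoding); the number of attachments is len(fp["attachments"])
            fp["attachments"].append(
                (filename, ctype, part.get("Content-Transfer-Encoding", "7bit")))
        elif ctype in ("text/plain", "text/html"):
            body = part.get_payload(decode=True) or b""
            try:
                text = body.decode(part.get_content_charset() or "ascii", errors="replace")
            except LookupError:  # unknown charset name
                text = body.decode("latin-1")
            fp["text" if ctype == "text/plain" else "html"].append(text)
    return fp
```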




A simple implementation of the extractor 105 parses the email, presenting
attachments in turn to the analyser/matcher 106. This can be improved by recursively
analysing compound attachments. For instance, if the attachment is an archive such as a
ZIP file, the extractor will extract each file, presenting these in turn to the analyser/matcher
106. If these files are also archives, these will also be extracted in turn, and so on until no
more extraction can be done. Files packed using packers such as UPX or ASPack can be
unpacked. Self-extracting executables can have the files they contain extracted.
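Recursive extraction of ZIP archives can be sketched with Python's standard `zipfile` module. This is a minimal sketch under stated assumptions: it handles ZIP only (unpacking UPX or ASPack files and self-extracting executables is not shown), it also presents each container itself to the matcher, and the `max_depth` cut-off and the function name are hypothetical.

```python
import io
import zipfile
from typing import Iterator, Tuple


def iter_extracted(name: str, data: bytes, depth: int = 0,
                   max_depth: int = 5) -> Iterator[Tuple[str, bytes]]:
    """Yield a file and, recursively, every file inside it if it is a ZIP archive."""
    yield name, data  # the container itself is presented as well
    if depth >= max_depth or not zipfile.is_zipfile(io.BytesIO(data)):
        return
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            inner = archive.read(info)
            # Nested archives are expanded in turn until nothing more can be extracted.
            yield from iter_extracted(f"{name}/{info.filename}", inner, depth + 1, max_depth)
```

Each yielded `(name, data)` pair would be handed to the analyser/matcher 106 in turn.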

The extractor 105 can also have extra logic to detect situations where a
malicious attacker tries to attack the system by sending files that extract to extremely large
sizes, or that take a very long time to extract.
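One way to add such logic for ZIP archives is to check the sizes recorded in the archive before reading each entry. In this minimal sketch the limits, the exception class and the function name are all hypothetical, and the time limit is only checked between entries, not during decompression.

```python
import io
import time
import zipfile
from typing import Dict

# Hypothetical limits; real values would be tuned to the deployment.
MAX_TOTAL_BYTES = 50 * 1024 * 1024
MAX_RATIO = 100      # uncompressed size / compressed size
MAX_SECONDS = 5.0


class ExtractionRefused(Exception):
    """Raised when an archive looks designed to exhaust memory or time."""


def read_zip_with_limits(data: bytes) -> Dict[str, bytes]:
    start = time.monotonic()
    total = 0
    out = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            # Sizes come from the archive's own headers, so they are checked before reading.
            if info.compress_size and info.file_size / info.compress_size > MAX_RATIO:
                raise ExtractionRefused(f"{info.filename}: compression ratio too high")
            total += info.file_size
            if total > MAX_TOTAL_BYTES:
                raise ExtractionRefused("total uncompressed size too large")
            # The clock is only checked between entries, not during decompression.
            if time.monotonic() - start > MAX_SECONDS:
                raise ExtractionRefused("extraction taking too long")
            out[info.filename] = archive.read(info)
    return out
```

Because the declared sizes are supplied by the attacker, a more careful version would also count the bytes actually produced, for example by reading each entry in chunks through `archive.open(info)`.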

The analyser/matcher 106 analyses the file to try to determine whether it
contains code that creates emails matching the fingerprint created by the gatherer. Each
match-type is assigned a certain score. For instance, a match of a particular deviation from
an RFC standard may score X, and a match of the text content of the email may score Y.
Scores are added together, and if the total passes a set value, this is deemed a
match.
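The additive scoring can be sketched as a weighted sum over match types. The weights below are hypothetical stand-ins for the text's X and Y, the type names are illustrative, and "passes a set value" is read here as reaching or exceeding the threshold.

```python
from typing import Dict, Iterable

# Hypothetical weights standing in for the text's "X" and "Y"; values are illustrative only.
MATCH_WEIGHTS: Dict[str, float] = {
    "rfc_deviation": 3.0,          # e.g. missing final MIME boundary
    "unusual_header": 2.0,
    "header_capitalisation": 1.0,
    "body_text": 4.0,              # email text found in the attachment
}
MATCH_THRESHOLD = 5.0


def match_score(matched_types: Iterable[str]) -> float:
    """Sum the weights of the distinct match types found in the attachment."""
    return sum(MATCH_WEIGHTS.get(t, 0.0) for t in set(matched_types))


def attachment_matches(matched_types: Iterable[str]) -> bool:
    # Reaching or exceeding the threshold is treated as a match.
    return match_score(matched_types) >= MATCH_THRESHOLD
```

Counting each match type once prevents a single repeated feature from dominating the total.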

The analyser/matcher 106 can be very simple. For instance, an
analyser that merely extracts strings from a file using the standard Linux 'strings' command
(which returns the strings of printable characters in a file), and matches these against the
text content of the email, is sufficient to detect most mass mailing viruses currently in
existence.
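The 'strings'-based analyser can be approximated in a few lines. This sketch only roughly emulates the command (runs of at least four printable ASCII or tab bytes, matching the command's default minimum length), and the rule that an email line counts as found when it occurs inside any extracted string, along with the `min_line` cut-off and both function names, is a hypothetical choice.

```python
import re
from typing import List


def printable_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Roughly emulate `strings`: runs of printable ASCII (or tab) of at least min_len bytes."""
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def email_lines_in_attachment(email_text: str, data: bytes, min_line: int = 8) -> List[str]:
    """Return lines of the email text that occur inside some string extracted from the attachment."""
    strings = printable_strings(data)
    lines = [ln.strip() for ln in email_text.splitlines()]
    # Very short lines are skipped because they would match by chance (hypothetical cut-off).
    return [ln for ln in lines if len(ln) >= min_line and any(ln in s for s in strings)]
```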

Of course, the more complex the analyser/matcher 106 is, the better the
detection rate will be. For instance, cryptographic routines can be added to detect encoded
email text, as used by the W32/Klez.H virus. Code analysing routines can be added to
search for email-specific routines, work out how any email is created, and so on.
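As one illustration of such a routine, the attachment can be searched for the email text XORed with every possible single-byte key. Single-byte XOR is an assumed, deliberately simple obfuscation chosen for this sketch; the source does not say how W32/Klez.H encodes its text, and the function name is hypothetical.

```python
from typing import List, Tuple


def find_single_byte_xor(plaintext: bytes, data: bytes) -> List[Tuple[int, int]]:
    """Return (key, offset) pairs where plaintext occurs in data XORed with a one-byte key."""
    hits = []
    for key in range(1, 256):  # key 0 would be the plain, unencoded text
        encoded = bytes(b ^ key for b in plaintext)
        offset = data.find(encoded)
        while offset != -1:
            hits.append((key, offset))
            offset = data.find(encoded, offset + 1)
    return hits
```

Longer ciphers would need correspondingly more elaborate routines, but the principle of encoding the known email text and searching for it is the same.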

The exception checker 107 contains rules to filter out false positives that
have occurred in the past. For instance, if someone uses the Eudora email client to mail a
copy of Eudora to a friend, the fingerprints will match the attachment, causing the email to
be treated as viral. This can be overcome by, for instance, creating an MD5 checksum of
the attachment and comparing it to a list of known MD5 checksums for standard mail
clients; the exception checker can then recognise these known email clients and ignore
them. MD5 is an example of a checksum or hashing method sensitive enough to detect
whether even a single bit in the data from which the checksum is derived has been
changed.
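The checksum exception can be sketched with Python's `hashlib`. The allow-list contents and function names are hypothetical; in practice the entries would be generated from vetted, unmodified copies of each mail client.

```python
import hashlib
from typing import Dict


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


# Hypothetical allow-list mapping MD5 digests of vetted, unmodified mail-client
# files to a description. Entries would be generated from known-good copies.
KNOWN_CLIENTS: Dict[str, str] = {}


def is_known_mail_client(attachment: bytes, known: Dict[str, str] = KNOWN_CLIENTS) -> bool:
    """Exception check: True if the attachment is byte-for-byte a known client."""
    return md5_hex(attachment) in known
```

MD5 is no longer considered resistant to deliberately constructed collisions, so `hashlib.sha256` would be a straightforward drop-in alternative for the digest.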



As noted above, the system 100 can be used as a stand-alone virus detection
algorithm, or combined with others implementing other virus-detection techniques as part
of a larger system. For instance, files flagged as mass mailing viruses by this method may
be allocated a certain score, or a variety of scores depending on which tests pass and fail.
Files which score some matches, but not enough for the matcher to flag them as a mass
mailing virus, may be assigned a lower score.

These are then combined with scores from other heuristic techniques, and
only if the total score passes some limit is the file flagged as viral.
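Combining this method's result with other heuristics might look like the following minimal sketch. The full and partial scores, the overall limit and the function names are hypothetical values chosen for illustration, since the text leaves them open.

```python
from typing import Iterable

# Hypothetical scores and limit; the text leaves the actual values open.
FULL_MATCH_SCORE = 10.0     # attachment flagged as a mass mailing virus by this method
PARTIAL_MATCH_SCORE = 4.0   # some matches, but below the matcher's own threshold
OVERALL_LIMIT = 12.0


def mass_mailing_contribution(match_score: float, match_threshold: float) -> float:
    if match_score >= match_threshold:
        return FULL_MATCH_SCORE
    if match_score > 0:
        return PARTIAL_MATCH_SCORE
    return 0.0


def flag_as_viral(match_score: float, match_threshold: float,
                  other_heuristic_scores: Iterable[float]) -> bool:
    # Only when the combined total reaches the limit is the file flagged as viral.
    total = mass_mailing_contribution(match_score, match_threshold) + sum(other_heuristic_scores)
    return total >= OVERALL_LIMIT
```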

Code analysis can also be used by the analyser/matcher 106. For instance,
the MIME headers in a mail will be present in a certain order, and if it can be ascertained
by code analysis that the attachment creates emails with the MIME headers in that exact
order, then this is a very good sign that the attachment created the email. A simple
implementation can be achieved by finding references in the code to the data areas used to
construct the email, and then noting the order in which these references occur.
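The order check can be sketched once the references have been found. The input format here is an assumption: `(code_offset, string)` pairs such as a disassembler might report, with obtaining them outside the sketch. It also assumes that the order of references in the code reflects the order in which headers are written, which holds for straight-line code but not necessarily across jumps and loops. The function name is hypothetical.

```python
from typing import Dict, List, Tuple


def headers_built_in_email_order(header_order: List[str],
                                 code_refs: List[Tuple[int, str]]) -> bool:
    """True if the code first references each header string in the same order
    as the headers appear in the email.

    code_refs holds (code_offset, referenced_string) pairs, such as a
    disassembler might report; producing them is outside this sketch.
    """
    first_ref: Dict[str, int] = {}
    for offset, text in sorted(code_refs):
        first_ref.setdefault(text, offset)  # keep the earliest reference only
    if any(name not in first_ref for name in header_order):
        return False                        # some header is never referenced
    offsets = [first_ref[name] for name in header_order]
    return offsets == sorted(offsets)
```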

Virus writers often encode parts of the data area of their programs in order
to try to hide what they are doing. Attempts can be made in various ways to match parts
of the email to data which is encrypted in the attachment. For instance, if the email
contains the text:

we seem to sew 
this might be encoded as 

er drrz yp dre 
First we number the character positions in the string:

12345678901234 
we seem to sew 

Then we note that the letter 'w' occurs in the 1st and 14th positions in the
original. We also note that the letter 'e' in the potential encoded string also
occurs in the 1st and 14th positions, and in no other position.

Next we note that the letter 'e' occurs in the 2nd, 5th, 6th and 13th
positions in the original. We also note that the letter 'r' in the potential encoded string also
occurs in the 2nd, 5th, 6th and 13th positions, and in no other position.

Thus, so far the potential encoded string seems to be the same as the original
string, but with the letter 'e' replacing the letter 'w' and the letter 'r' replacing the letter 'e'. If
we can repeat this test successfully for each different character in the original string, then it is
highly likely that the second string really is an encoded version of the original string.
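The position test just described amounts to checking that one string maps onto the other by a consistent one-to-one character substitution. In this minimal sketch the function names are hypothetical, and attachment bytes would first need decoding to text (for example as latin-1) before being scanned.

```python
from typing import List


def same_substitution_pattern(original: str, candidate: str) -> bool:
    """True if every character of original occupies exactly the same set of
    positions as its counterpart in candidate, i.e. candidate could be
    original under a one-to-one character substitution."""
    if len(original) != len(candidate):
        return False
    forward, backward = {}, {}
    for a, b in zip(original, candidate):
        # A character must always map to the same partner, in both directions.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True


def find_encoded(original: str, data: str) -> List[int]:
    """Offsets in data where a window of the same length passes the test."""
    n = len(original)
    return [i for i in range(len(data) - n + 1)
            if same_substitution_pattern(original, data[i:i + n])]
```

For example, `same_substitution_pattern("we seem to sew", "er drrz yp dre")` returns `True`. The test only discriminates when the original text contains repeated characters. A string whose characters are all distinct would match any window of distinct characters.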



The analyser/matcher 106 may be arranged to execute algorithms such as 
the one just described in order to find evidence that the attachment could have created 
structural elements of the email. 



CLAIMS 

1. A method of anti-virus processing an email having an executable attachment
comprising the steps, executed by a machine, of:

a) extracting structural elements from the email; 

b) examining the executable attachments for code, data or encoded data 
that could have created the structural elements extracted earlier; and 

c) signalling that the attachment is possibly viral or not on the basis of the
extent to which the examining step b) finds evidence that the structural elements have been
created by that attachment.

2. A method according to claim 1, wherein the structural elements are
categorised and the step c) includes assigning a numeric score for each element which
could have been created by that attachment, and signalling that the attachment is possibly
viral or not on the basis of an overall score.

3. A method according to claim 2, wherein the scores are weighted according
to category.

4. A method according to any one of the preceding claims, wherein the
signalling step c) takes account of factors including any or all of the following attributes of
the email:

standard MIME headers;
unusual MIME headers;
deviations from RFC standards;
unusual constructs;
number of attachments;
type of attachments;
encoding method used for attachments;
text content of the email; and
HTML or XHTML content of the email.



5. A method according to any one of claims 1 to 4, wherein the step a) includes
extracting the structural elements as strings, the step b) includes examining the attachments
for matches of those strings and the step c) signals the attachment as possibly viral or not
on the basis of the extent to which the examining step b) finds occurrences of the strings in
the attachment.

6. A system for anti-virus processing an email having an executable
attachment comprising the following means, implemented by a machine:

a) means for extracting structural elements from the email; 

b) means for examining the executable attachments for code, data or 
encoded data that could have created the structural elements extracted earlier; and 

c) means for signalling that the attachment is possibly viral or not on the
basis of the extent to which the examining means b) finds evidence that the structural
elements have been created by that attachment.

7. A system according to claim 6, wherein the structural elements are
categorised and the means c) includes means for assigning a numeric score for each
element which could have been created by that attachment, and signalling that the
attachment is possibly viral or not on the basis of an overall score.

8. A system according to claim 7, wherein the scores are weighted according 
to category. 

9. A system according to any one of claims 6 to 8, wherein the signalling means
c) takes account of factors including any or all of the following attributes of the email:

standard MIME headers;
unusual MIME headers;
deviations from RFC standards;
unusual constructs;
number of attachments;
type of attachments;
encoding method used for attachments;
text content of the email; and
HTML or XHTML content of the email.



10. A system according to any one of claims 6 to 9, wherein the means a)
includes extracting the structural elements as strings, the means b) includes examining the
attachments for matches of those strings and the means c) signals the attachment as
possibly viral or not on the basis of the extent to which the examining means b) finds
occurrences of the strings in the attachment.

11. A method of anti-virus processing an email having an executable attachment
substantially as hereinbefore described and with reference to the accompanying drawing.
12. A system for anti-virus processing an email having an executable
attachment substantially as hereinbefore described and with reference to the accompanying
drawing.




ABSTRACT 



A system for anti-virus processing an email having an executable
attachment extracts structural elements of the email and examines the executable
attachments for code, data or encoded data that could have created these elements. This is
effective to detect at least some mass mailing viruses, where the executable attachment
creates later generations of the attachment, and structural elements, such as strings, which
appear in the later emails are present in the attachment.